Skip to content

Conversation

lslusarczyk
Copy link
Contributor

@lslusarczyk lslusarczyk commented Sep 11, 2025

@lslusarczyk lslusarczyk requested a review from a team as a code owner September 11, 2025 12:30
@bratpiorka bratpiorka requested review from lplewa and ldorau September 11, 2025 12:53
Copy link
Contributor

@ldorau ldorau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 / 16 files reviewed (will be continued)

@lplewa
Copy link
Contributor

lplewa commented Sep 19, 2025

I still oppose against adding provider specific code to ops interface.
If you don't want to do it thru CTL just implement it as umfLevelZeroMemoryProviderResidentDeviceChange() in provider_level_zero.h which do not go throught OPS. This function will act as an "level zero specific api extension".

note: remember that you will have to call umfMemoryProviderGetPriv() in this function to retrieve ze_memory_provider_t from provider handle


static void init_ze_global_state(void) {

char *lib_name = getenv("UMF_ZE_LOADER_LIB_NAME");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do this thru CTL please.

We created CTL to limit new envvariables.

Copy link
Contributor Author

@lslusarczyk lslusarczyk Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for tests only. Env variables are simple to write and to understand for future maintainers. Could you please accept it as is in this PR? But I am OK if anyone wants to change it to CTL after this PR is merged.


// TODO: add assertions to UMF and change it to be an assertion
if (info->props.base != (void *)key) {
LOG_ERR("key:%p is different than base:%p", (void *)key,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG_FATAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, applied

Comment on lines 27 to 45
#define LOG_DEBUG(...) utils_log(LOG_DEBUG, __func__, __VA_ARGS__);
#define LOG_INFO(...) utils_log(LOG_INFO, __func__, __VA_ARGS__);
#define LOG_WARN(...) utils_log(LOG_WARNING, __func__, __VA_ARGS__);
#define LOG_ERR(...) utils_log(LOG_ERROR, __func__, __VA_ARGS__);
#define LOG_FATAL(...) utils_log(LOG_FATAL, __func__, __VA_ARGS__);

#define LOG_PDEBUG(...) utils_plog(LOG_DEBUG, __func__, __VA_ARGS__);
#define LOG_PINFO(...) utils_plog(LOG_INFO, __func__, __VA_ARGS__);
#define LOG_PWARN(...) utils_plog(LOG_WARNING, __func__, __VA_ARGS__);
#define LOG_PERR(...) utils_plog(LOG_ERROR, __func__, __VA_ARGS__);
#define LOG_PFATAL(...) utils_plog(LOG_FATAL, __func__, __VA_ARGS__);
#ifdef UMF_DEVELOPER_MODE
#define UMF_STRINGIFY(x) #x
#define UMF_TOSTRING(x) UMF_STRINGIFY(x)
#define UMF_FUNC_DESC() __FILE__ ":" UMF_TOSTRING(__LINE__)
#else
#define UMF_FUNC_DESC() __func__
#endif

#define LOG_DEBUG(...) utils_log(LOG_DEBUG, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_INFO(...) utils_log(LOG_INFO, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_WARN(...) utils_log(LOG_WARNING, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_ERR(...) utils_log(LOG_ERROR, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_FATAL(...) utils_log(LOG_FATAL, UMF_FUNC_DESC(), __VA_ARGS__);

#define LOG_PDEBUG(...) utils_plog(LOG_DEBUG, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PINFO(...) utils_plog(LOG_INFO, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PWARN(...) utils_plog(LOG_WARNING, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PERR(...) utils_plog(LOG_ERROR, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PFATAL(...) utils_plog(LOG_FATAL, UMF_FUNC_DESC(), __VA_ARGS__);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a separate PR.

There are comments which i whould make - but i strongly believe that this discussion should happen in the separate PR to do not block this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// found
if (is_adding) {
utils_write_unlock(&ze_provider->resident_device_rwlock);
// impossible for UR, should be an assertion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho we should not assume that user will not do something then is should be an assertion. If it comes it for any kind of user it should be validated an correct error should be returned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, comment removed


uint32_t existing_peer_index = 0;
utils_write_lock(&ze_provider->resident_device_rwlock);
while (existing_peer_index < ze_provider->resident_device_count &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: this code whould be much easier to read if it whould be a for loop

for (index = 0; index < count; index++) if (device[index] == index) break;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need for loop index to be available outside the loop. I don't think it is more readable. Left as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do the incrementation within the while loop and have extra && to mark the stop, I believe I agree with Łukasz - this would look better as for.

// you obviously don't have to init anything in for (e.g. for(; index < count; index++)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes i noticed that you need it outside of loop - this is why i write
for(index = 0; index < count; index++) if (device[index] == index) break;
not
for (int index = 0; index < count; index++) if (device[index] == index) break;

If something is for loop we should use for not while. If you need access to loop counter outside of function you can declare it before loop - this is a standard approach which should not confuse reader.

Comment on lines 1062 to 1071
const uint32_t new_capacity =
ze_provider->resident_device_capacity * 2 +
1; // +1 to work also with old capacity == 0
ze_device_handle_t *new_handles =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this code is performance critical, or will be called multiple times

If not i would prefer to have this array just always reallocated by one,
If yes can we abstract this vector like structure to separate module, or even better just use list structure with is included in umf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, not critical, applied

.failed_changes = 0,
};

umf_result_t result = umfMemoryTrackerIterateAll(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "hacky" and it will not work if someone uses pool without tracker or just use provider without pool.

We allready solved this issue in other providers were we keep track allocation by the provider (for example please check os_provider)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was told by you, that I can use tracker. Since it is not a case now that l0 provider is used without tracker, so I am only leaving a comment here that warns about what you noticed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we miss communicated - I will follow up on this later on.

@lslusarczyk
Copy link
Contributor Author

I still oppose against adding provider specific code to ops interface. If you don't want to do it thru CTL just implement it as umfLevelZeroMemoryProviderResidentDeviceChange() in provider_level_zero.h which do not go throught OPS. This function will act as an "level zero specific api extension".

note: remember that you will have to call umfMemoryProviderGetPriv() in this function to retrieve ze_memory_provider_t from provider handle

Great! Using umfLevelZeroMemoryProviderResidentDeviceChange simplified code even more than using ops. Applied with pleasure.

@lslusarczyk
Copy link
Contributor Author

@lukaszstolarczuk , @bratpiorka , @ldorau ,
CI passed, please check if I applied your comments correctly, resolve discussions which can be resolved and give approve if there is nothing left to do with respect to your comments

@lplewa , please do the same wrt your comments when you're back


uint32_t existing_peer_index = 0;
utils_write_lock(&ze_provider->resident_device_rwlock);
while (existing_peer_index < ze_provider->resident_device_count &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do the incrementation within the while loop and have extra && to mark the stop, I believe I agree with Łukasz - this would look better as for.

// you obviously don't have to init anything in for (e.g. for(; index < count; index++)

Copy link
Contributor

@lplewa lplewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor commits structure.

commits in PR should be split to functional parts, with proper commit message.
Each commit should work (at least it should build - but we do not check it so exception happens). You should not split implementation, tests, documentation to separate commits.

If i were you i would create(This list is an inspiration for proper commits structure, you do not have split your PR to exactly this commits) : commit for adding iterate over critnib, commit for changes in asserts, maybe commit for this mock library, commit for allowing ovveride lib_name, commit for change in setResidentDevides, commit for main functionality of this PR....


uint32_t existing_peer_index = 0;
utils_write_lock(&ze_provider->resident_device_rwlock);
while (existing_peer_index < ze_provider->resident_device_count &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes i noticed that you need it outside of loop - this is why i write
for(index = 0; index < count; index++) if (device[index] == index) break;
not
for (int index = 0; index < count; index++) if (device[index] == index) break;

If something is for loop we should use for not while. If you need access to loop counter outside of function you can declare it before loop - this is a standard approach which should not confuse reader.

LOG_ERR("umfMemoryTrackerIterateAll did not manage to do some change "
"numFailed: %d, numSuccess: %d",
privData.success_changes, privData.failed_changes);
return UMF_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When returining this error code get_last_native_error function should be able to return more precise error message - please add support for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, added store_last_native_error(result); in ze_memory_provider_resident_device_change_helper

.failed_changes = 0,
};

umf_result_t result = umfMemoryTrackerIterateAll(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we miss communicated - I will follow up on this later on.

return 0;
}

umf_result_t umfLevelZeroMemoryProviderResidentDeviceChange(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to have add and remove function - i recall pbalcer already suggested it too.

I will split it to 3 functions - add, delete, and helper function to iterate over existing allocations (so we limit code duplication)

@lslusarczyk
Copy link
Contributor Author

Please refactor commits structure.

commits in PR should be split to functional parts, with proper commit message. Each commit should work (at least it should build - but we do not check it so exception happens). You should not split implementation, tests, documentation to separate commits.

If i were you i would create(This list is an inspiration for proper commits structure, you do not have split your PR to exactly this commits) : commit for adding iterate over critnib, commit for changes in asserts, maybe commit for this mock library, commit for allowing ovveride lib_name, commit for change in setResidentDevides, commit for main functionality of this PR....

Refactor will be squashing all commits to one during merge.
Each commit will be working :)

@lplewa
Copy link
Contributor

lplewa commented Oct 6, 2025

Please refactor commits structure.
commits in PR should be split to functional parts, with proper commit message. Each commit should work (at least it should build - but we do not check it so exception happens). You should not split implementation, tests, documentation to separate commits.
If i were you i would create(This list is an inspiration for proper commits structure, you do not have split your PR to exactly this commits) : commit for adding iterate over critnib, commit for changes in asserts, maybe commit for this mock library, commit for allowing ovveride lib_name, commit for change in setResidentDevides, commit for main functionality of this PR....

Refactor will be squashing all commits to one during merge. Each commit will be working :)

We do not do squash merge in this project

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants